Abstract

Many agent-based simulation approaches have been proposed for pedestrian
flow. As such models are applied, e.g., in evacuation studies, their quality
and reliability are of vital interest. Pedestrian trajectories are
functional data, and thus functional principal component analysis (PCA) is
a natural tool to assess the quality of pedestrian flow models beyond
average properties. In this article we conduct functional PCA for the
trajectories of pedestrians passing through a bottleneck. In this way it is
possible to assess the quality of a model not only on the basis of average
values but also by considering its fluctuations. We benchmark two
agent-based models of pedestrian flow against the experimental data using
PCA average and stochastic features. Functional PCA proves to be an
efficient tool to detect deviations between simulation and experiment and
to assess the quality of pedestrian models.
Keywords: pedestrian dynamics; statistical analysis; comparison with 
experiment; functional PCA; model quality 
PACS: 89.75.-k Complex Systems, 05.40.-a Stochastic Models


1. Introduction 

Most force-based models qualitatively describe the movement of crowds
of pedestrians. Self-organization phenomena, e.g. lane formation [?],
oscillations at bottlenecks, clogging at exit doors [?], [29], etc., are
reproduced. From a physical point of view it is of interest how simple
models qualitatively reproduce self-organization phenomena of driven
multi-particle systems. This contributes to a better understanding of the
investigated systems and the essential interactions. In addition, numerical
simulations based on these models are used to address safety-related
issues, concerning e.g. the design and conception of escape routes in
buildings [?], [28] or the optimal organization of mass events or public
transport facilities (VISWalk [31], Legion [32], ...). For such uses a
thorough quantitative validation of the models is obligatory to ensure a
reliable layout, dimensioning or evaluation of pedestrian facilities. In
most known cases this is done by reproducing the fundamental diagram
[23], [5], [2], [9] or by measuring the flow through bottlenecks
[?], [2], [13]. An overview of the quantitative validation of models by
means of the fundamental diagram is given in [21]. On the one hand, these
quantitative methods have in common that they are based on calculating
specific traffic quantities, e.g. density, flow and velocity. On the other
hand, these measurements are based on values averaged locally over time or
space. References [30] and [?] provide examples of how the measurement
methods can influence the resulting empirical relations of such granular
and heterogeneous systems of finite size. The differences between the
measurement methods suggest that important information on the system may
be lost during the measurement process. Moreover, state-of-the-art models
describe pedestrian dynamics on a more detailed level by simulating the
trajectory of every single pedestrian. In principle this allows a
validation method that not only assesses the average pedestrian or traffic
flow behavior, but also accounts for the amount and typical nature of
fluctuations around this average.

A first methodology based on exploiting information of individual
trajectories was introduced in [10] to calibrate the social force model.
While one pedestrian was moved according to the model, the others were
moved according to real trajectories. By means of an evolutionary algorithm
the deviations of the resulting trajectories from the experimental ones
were used to calibrate the parameters of the model. However, this approach
does not allow an assessment of the quality of a model.

While an abundance of agent-based models in the field of pedestrian and
traffic dynamics has been developed in recent years [?], the question of a
systematic comparison of experimental evidence and model-generated results
has not attracted the same attention. Such a comparison would, however, be
important for ranking models as more or less adequate. As argued above, the
evaluation methodology should provide a comparison of model results and
empirical data at the level of detail of the model. It is desirable that
such a validation method should not only be able to assess average
pedestrian or traffic flow behavior, but also account for the amount and
typical nature of fluctuations around this average.

Among the difficulties in this validation process is the fact that
agent-based pedestrian or traffic flow data is functional, i.e. to each
individual we associate data in the infinite-dimensional space of
trajectories $x(t)$. The adequate statistical approach for the study of
pedestrian or traffic flow data is thus the well-established method of
functional data analysis [20]. In this method, the variation in the
trajectories of different agents is interpreted as random fluctuation.
Thus, the measured or simulated trajectories are interpreted as
realizations of some stochastic process $X(t) \in \mathbb{R}^2$, where $t$
stands for a time parameter and $X(t) = X(t,\omega)$ tacitly depends on
some random parameter $\omega$ from a probability space
$(\Omega, \mathcal{A}, P)$. For more details the reader is referred to [1].
Although there are infinitely many trajectories available for an agent to
move from point A to point B, it often turns out that a few typical modes
of variation around the average movement are responsible for the bulk of
the fluctuation of trajectories between different individuals. Functional
principal component analysis (PCA), a classical method in the analysis of
functional data, is the standard tool to find and analyze these typical
variations.

The scope of this article is to use functional PCA to study the
performance of agent-based models of pedestrian motion with respect to
experimental data. In order to demonstrate the methodological approach,
two models, the social force model (SFM) [13] and the generalized
centrifugal force model (GCFM) [2], are used to simulate pedestrian
movement through a bottleneck of the same dimensions. In the following we
apply functional PCA, using the open source extension fda by Ramsay,
Hooker and Graves [19] to conduct the analysis. We present the results and
give a detailed comparison of average values for locations and velocities
and their respective principal components. For the latter we separately
compare strength, distribution of total variation, and morphology of the
principal components.

We show that functional PCA can in fact be used to make statistically
significant statements about model quality. Functional PCA reveals
significant deviations between both models and the experiment already on
the level of average values. While the morphology of the principal
components for locations is more or less adequately represented by both
models, there are significant deviations in the strength of fluctuations
around the mean behavior, with the GCFM underestimating the experimentally
observed fluctuations while the SFM mostly overestimates the fluctuation
strength. These empirical observations can be confirmed by statistical
significance testing using the PCA-bootstrap methodology [?].

In this article, for the first time we combine functional PCA in the sense
of [?] with the bootstrapping of scores in order to calculate the
fluctuations of specific statistics that describe and distinguish
characteristic features of fluctuations of individual pedestrian behavior
in a crowd. Also on the PCA side, benchmarking and testing with specific
statistics evaluated in functional PCA is a new strategy, to the best of
our knowledge.

The article is organized as follows. In Section 2 we review the pedestrian
flow experiment [23] as the benchmark case for this study. Section 3 gives
a brief account of the SFM and the GCFM. Section 4 reviews functional PCA
and its numerical implementation. Section 5 is the main part of this
article. After some introductory remarks on data formatting and smoothing
(Subsection 5.1), we compare average data for x and y position data
(Subsection 5.2) and velocities in the main direction of motion, which is
the x-direction. We then compare fluctuation strength via PCA eigenvalues
(Subsection 5.3) and morphology for the first PCA harmonics for x- and
y-positions and x-velocities (Subsection 5.4). Section 6 presents the
PCA-bootstrap approach in the context of spline-based PCA (Subsection 6.1)
and applies this to total variation and Gini index (Subsection 6.2) as
well as the $L^2$-distance of the average trajectories and the
Hilbert-Schmidt distance of the empirical correlation functions
(Subsection 6.3). In Section 7 we summarize our findings and draw some
conclusions on model quality in the specific case and on the general
applicability of functional PCA in the given context.


2. Experiment 


In this work we use as a reference the experimental data extracted from
the experiment [23], which was performed in 2006 in the wardroom of the
“Bergische Kaserne Düsseldorf” (see Fig. 1).

A waiting area was used to distribute the attendees before the start of
each run of the experiment. For simulation purposes we enlarge the area
of the set-up by an extra room of length $e$. This is necessary to take
into consideration the effects of pedestrians who have left the bottleneck
on the pedestrians still in the system.

The flow through the bottleneck is measured as follows:

\[ J = \frac{N}{\Delta t}, \tag{1} \]

with $N$ the number of pedestrians and
$\Delta t = t_{\mathrm{last}} - t_{\mathrm{first}}$ the time gap between
the first and the last pedestrian passing the bottleneck at the measurement
line.

Figure 1: The simulation set-up: pedestrians start from the shaded area and
move through the bottleneck ($l = 4$ m, $h = 4.5$ m, $b = 6$ m and
$w = 0.9$ m). An adjacent area of length $e = 2.5$ m is added to account
for the backward effect of leaving pedestrians on those still in the
bottleneck.
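As a small illustration of Eq. (1), the flow can be computed directly from
the times at which pedestrians cross the measurement line. The following is
a minimal sketch; the function name `bottleneck_flow` and the list-of-times
input format are assumptions made for this example and are not taken from
the experiment's data format.

```python
def bottleneck_flow(passage_times):
    """Return J = N / (t_last - t_first) in pedestrians per second.

    passage_times: times (in seconds) at which each pedestrian crosses
    the measurement line (hypothetical input format).
    """
    times = sorted(passage_times)
    if len(times) < 2:
        raise ValueError("need at least two passage times")
    delta_t = times[-1] - times[0]
    if delta_t <= 0:
        raise ValueError("passage times must span a positive interval")
    return len(times) / delta_t
```

For example, four pedestrians passing at 0, 1, 2 and 4 s give
$J = 4 / 4\,\mathrm{s} = 1\,\mathrm{s}^{-1}$.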

3. Models 

Force-based models describe the movement of pedestrians as a
superposition of forces. Given the state variables
$(\vec{x}_i(t), \vec{v}_i(t))$ of pedestrian $i$ at time $t$ and
considering Newton’s second law of dynamics, the state of each pedestrian
$i$ is defined by:

\[ m_i \frac{d\vec{v}_i}{dt} = \sum_{j \neq i} \vec{f}_{ij}
   + \sum_{w} \vec{f}_{iw} + \vec{f}_i^{\,\mathrm{drv}}, \tag{2} \]

\[ \frac{d\vec{x}_i}{dt} = \vec{v}_i, \tag{3} \]

where $\vec{f}_{ij}$ denotes a repulsive force acting from the $j$-th
pedestrian on the $i$-th pedestrian, $\vec{f}_{iw}$ is a repulsive force
emerging from borders, walls etc., $\vec{f}_i^{\,\mathrm{drv}}$ is a
driving force and $m_i$ is the mass of pedestrian $i$.








The superposition of the forces reflects the fact that pedestrians move
towards a certain point in space (e.g. an exit) and meanwhile try to avoid
collisions with each other or with walls and objects.

The driving force $\vec{f}_i^{\,\mathrm{drv}}$ models, at low densities,
an exponential acceleration towards a desired speed $v_0$. The following
expression [18] is used:

\[ \vec{f}_i^{\,\mathrm{drv}} = m_i\,
   \frac{v_0\,\vec{e}_i^{\,0} - \vec{v}_i}{\tau}, \tag{4} \]

with a relaxation time $\tau$ typically equal to 0.5 s, and a desired
direction $\vec{e}_i^{\,0}$ of pedestrian $i$.
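The exponential relaxation produced by the driving force can be made
concrete with a short numerical sketch. It assumes the standard form
$m\,(v_0 \vec{e}^{\,0} - \vec{v})/\tau$ for the driving force, a single
pedestrian without any repulsive forces, and explicit Euler integration;
the function names are hypothetical. The default values ($v_0 = 1.1$ m/s,
$\tau = 0.5$ s, $m = 1$ kg, time step 0.01 s) are the ones quoted in this
article.

```python
def driving_force(v, v0=1.1, tau=0.5, m=1.0, e0=(1.0, 0.0)):
    """Driving force m * (v0 * e0 - v) / tau for a 2D velocity v."""
    return tuple(m * (v0 * e - vi) / tau for e, vi in zip(e0, v))


def relax_speed(t_end, dt=0.01, v0=1.1, tau=0.5, m=1.0):
    """Integrate dv/dt = f/m with explicit Euler, starting from rest."""
    v = (0.0, 0.0)
    steps = int(round(t_end / dt))
    for _ in range(steps):
        f = driving_force(v, v0=v0, tau=tau, m=m)
        v = tuple(vi + dt * fi / m for vi, fi in zip(v, f))
    return v
```

In the continuous limit the speed follows $v(t) = v_0 (1 - e^{-t/\tau})$,
so after one relaxation time the pedestrian has reached about 63% of the
desired speed; the Euler scheme reproduces this up to a small
discretization error.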

The repulsive force between pedestrians $\vec{f}_{ij}$ is defined
differently from one model to another.

In this work we study a variation of the SFM and the GCFM. Both
models are microscopic and continuous in space. In the GCFM the agents
have an elliptical form with velocity-dependent semi-axes, whereas the shape
of agents in the SFM is circular. In the general case, the distance
$\|\vec{d}_{ij}\|$ is defined as the distance between the borders of the
ellipses $i$ and $j$ along the line connecting their centers (see Fig. 2).
For the SFM the semi-axis orthogonal to the movement direction is equal to
the semi-axis in the direction of movement. For simplicity we write
$d_{ij}$ to denote the norm of the vector $\vec{d}_{ij}$.


3.1. The social force model (SFM) 

The SFM as originally published by Molnar [?] describes the movement
of circular agents as a superposition of different factors, e.g. the
influence of neighboring pedestrians, walls, attractions and groups. In
this work we reduce the complexity of the model to a minimum, considering
only the influence of pedestrians and walls and assuming only circular
potentials.




Figure 2: The effective distance $d_{ij}$ of two pedestrians represented by two ellipses.


The repulsive force in the SFM between agents $i$ and $j$ is defined as

\[ \vec{f}_{ij} = -m_i\, k_{ij}\, A \exp\!\left(-\frac{d_{ij}}{B}\right)
   \vec{e}_{ij}, \tag{5} \]

with

\[ d_{ij} = \|\vec{x}_j - \vec{x}_i\| - r_i - r_j \tag{6} \]

and

\[ \vec{e}_{ij} = \frac{\vec{x}_j - \vec{x}_i}{\|\vec{x}_j - \vec{x}_i\|},
   \tag{7} \]

where $r_i$ is the radius of the circular agent $i$. With the parameters
$A$ and $B$ the strength and the range of the force are adjusted. The
limited vision of pedestrians (180°) is modeled by the coefficient

\[ k_{ij} = \Theta(\vec{v}_i \cdot \vec{e}_{ij}). \tag{8} \]

$\Theta(\cdot)$ is the Heaviside function. The repulsive force between
pedestrians and static objects is defined similarly to (5).


3.2. The generalized centrifugal force model (GCFM) 

The repulsive force in the GCFM is inversely proportional to the distance 
of two ellipses representing moving pedestrians i and j and depends on their 
relative velocity: 
where $v_{ij} = \Theta\big((\vec{v}_i - \vec{v}_j) \cdot \vec{e}_{ij}\big)\,
(\vec{v}_i - \vec{v}_j) \cdot \vec{e}_{ij}$ is the relative velocity. The
use of the Heaviside function $\Theta(\cdot)$ ensures that faster
pedestrians are not affected by slower pedestrians. By means of the
parameter $\alpha$ the strength of the force can be adjusted. As mentioned
earlier, in the GCFM the space requirement in the direction of movement is
modeled by the semi-axis

\[ a = a_{\min} + \tau_a v_i, \tag{10} \]

with two parameters $a_{\min}$ and $\tau_a$, whereas the lateral swaying of
pedestrians is modeled by the semi-axis

\[ b = b_{\max} - (b_{\max} - b_{\min})\,\frac{v_i}{v_0}. \tag{11} \]


3.3. Model parameters 

As mentioned earlier, the original SFM includes several forces, e.g.
physical contact forces and attractive forces. For our purpose we use a
simplified version of the SFM as presented in Sec. 3.1. We choose
$A = 5$ N for pedestrian-pedestrian interactions (5) and $A = 7$ N for
pedestrian-wall interactions. The range of the function defined by the
parameter $B$ in (5) was chosen to be 0.08 m for pedestrian-pedestrian
interactions and 0.05 m for pedestrian-wall interactions. The parameter
$\alpha$ in (9) is set to 0.2 for pedestrian-pedestrian interactions and
0.33 for pedestrian-wall interactions. The desired speed is set to
$v_0 = 1.1$ m/s. For simplicity we set $m_i = 1$ kg for both models.
Table 1 summarizes the parameters used.
Parameter    Equation          Value

A_ped        (5)               5 N
A_wall       similar to (5)    7 N
B_ped        (5)               0.08 m
B_wall       similar to (5)    0.05 m
α_ped        (9)               0.2
α_wall       similar to (9)    0.33
τ            (4)               0.5 s
v_0          (4)               1.1 m/s
m            (2)               1 kg
τ_a          (10)              0.12 s
a_min        (10)              0.15 m
b_min        (11)              0.15 m
b_max        (11)              0.2 m

Table 1: Parameter values in simulations with both GCFM and SFM.


The values chosen in Tab. 1 differ from the values published in other
works [?]. Our choice of the above-mentioned values is supported by
qualitative reasons, ensuring minimal overlapping among pedestrians, as
well as by quantitative consideration of the flow through the bottleneck
(see Fig. 3). For safety-relevant simulations a careful calibration of the
models used is needed. Having calibrated two different models based on the
usual qualitative and quantitative criteria, we strive to apply a new
technique to assess the quality of the investigated models and to verify
whether the aforementioned validation is sufficient to ensure a
trustworthy and safe use of the produced simulations.

Figure 3: The flow through the bottleneck measured at the middle of the
corridor after the entrance to the bottleneck. The empirical value is a
reference value for the calibration of the GCFM and the SFM.

4. Functional PCA: Foundations 

4.1. What is functional PCA?

In this section we give some details of functional PCA following [20].
Principal component analysis uses the principal axis transformation for
multivariate, correlated numerical data using the (empirical) covariance
information between the single random variables. Eigenvalues then sort the
importance of the single eigenvectors (also called harmonics or modes)
according to the variance.

This concept needs to be adapted to the case where the observed data
from $n$ individuals are functions, as is the case with the trajectories of
pedestrians. The variability of the data can still be described with the
eigenvalues and eigenfunctions of the covariance function, seen as an
operator on the space of square-integrable functions. Given the stochastic
process of random trajectories $X(t)$, with $t \in (0, L)$, the covariance
function is defined as

\[ C(s,t) = \mathbb{E}\big[(X(s) - \mathbb{E}[X(s)])
   (X(t) - \mathbb{E}[X(t)])\big], \tag{12} \]

with $\mathbb{E}$ the expected value with respect to the underlying
probability space. This covariance function needs to be estimated from the
data $x_j(t)$:

\[ \hat{C}(s,t) = \frac{1}{n-1} \sum_{j=1}^{n}
   (x_j(s) - \bar{x}(s))(x_j(t) - \bar{x}(t)), \tag{13} \]

where the $j$-th observation $x_j(t)$ is one realization of the random
process $X(t)$ and $\bar{x}(t) = \frac{1}{n} \sum_{j=1}^{n} x_j(t)$.

In the following we assume that average values have already been removed
from the stochastic signal, i.e. we consider transformed random quantities
$X(t) \to X(t) - \mathbb{E}[X(t)]$ with estimated observations
$x_i(t) - \bar{x}(t)$. The eigenvalues of $C(s,t)$ can then be calculated
by solving the following eigenvalue equation:

\[ \int_0^L C(s,t)\,\xi(t)\,dt = \lambda\,\xi(s). \tag{14} \]

This results in a set of eigenvalues
$\lambda_1 \geq \lambda_2 \geq \dots \geq 0$ and corresponding
eigenfunctions $\xi_i(s)$. These eigenfunctions are orthonormal,
$\int_0^L \xi_i(t)\,\xi_j(t)\,dt = \delta_{ij}$, where $\delta_{ij} = 1$
for $i = j$ and zero otherwise. The eigenfunctions and eigenvalues can now
be determined approximately from the observations $x_j(t)$ by replacing
$C(s,t)$ by its empirical counterpart $\hat{C}(s,t)$.
4.2. Numerical approximation


The problem (14) is an infinite-dimensional eigenvalue problem and its
empirical counterpart is potentially very high-dimensional (of dimension
$n$). A frequently used method to make this problem numerically tractable
is to project the covariance function onto the space spanned by some finite
basis, e.g. a sufficiently fine B-spline or Fourier basis. Then one solves
for the eigenvalues and eigenfunctions in the given finite-dimensional
space of basis functions. Therefore we approximate the observed functions
$x_i(t)$ with a suitable linear combination of basis functions

\[ \tilde{x}_i(t) = \sum_{k=1}^{K} c_{i,k}\,\Phi_k(t)
   \quad\Leftrightarrow\quad \tilde{x}(t) = C\,\Phi(t), \tag{15} \]

with basis function vector $\Phi(t) = (\Phi_1(t), \dots, \Phi_K(t))^T$ and
coefficient matrix $C = (c_{j,k})_{j=1,\dots,n;\;k=1,\dots,K}$ obtained
e.g. by orthogonal projection of $x_j(t)$ onto the space spanned by the
basis functions and a subsequent basis decomposition, in which case
$c_{j,\cdot} = W^{-1} v_{j,\cdot}$ with
$W_{k,l} = \langle \Phi_k, \Phi_l \rangle = \int_0^L \Phi_k(s)\,\Phi_l(s)\,ds$
and $v_{j,k} = \langle x_j, \Phi_k \rangle = \int_0^L x_j(t)\,\Phi_k(t)\,dt$.
Here some numerical quadrature may be employed for the integrals involved
in the definition of $v_{j,k}$, whereas in most cases analytic formulae are
available for $W_{k,l}$. This projection method implies that the covariance
function can be approximated by

\[ \hat{C}(s,t) = \frac{1}{n-1}\,\Phi(s)^T \tilde{C}^T \tilde{C}\,\Phi(t).
   \tag{16} \]

Here we use $\tilde{C}$ for the matrix of coefficients of
$x_j(t) - \bar{x}(t)$ with respect to the basis $\Phi(t)$, namely
$\tilde{c}_{j,k} = c_{j,k} - \frac{1}{n} \sum_{j'=1}^{n} c_{j',k}$. Now we
expand the eigenfunction in the same basis functions, to a good
approximation:

\[ \xi(s) = \sum_{k=1}^{K} b_k\,\Phi_k(s) = \Phi(s)^T b. \tag{17} \]

The approximate eigenvalue equation can be written as

\[ \int_0^L \hat{C}(s,t)\,\xi(t)\,dt
   = \frac{1}{n-1}\,\Phi(s)^T \tilde{C}^T \tilde{C}\,W b
   = \lambda\,\Phi(s)^T b = \lambda\,\xi(s), \tag{18} \]

leading to the eigenvalue equation
$\frac{1}{n-1} \tilde{C}^T \tilde{C}\,W b = \lambda b
\;\Leftrightarrow\;
\frac{1}{n-1} W^{1/2} \tilde{C}^T \tilde{C}\,W^{1/2} u = \lambda u$,
$b = W^{-1/2} u$, which can be solved numerically. The result is a number
of eigenvalues $\lambda_1 \geq \lambda_2 \geq \dots \geq 0$ and coefficient
vectors $b_i$ for approximate principal components
$\xi_i(t) = \Phi(t)^T b_i$, for $i = 1, \dots, \min(K, n-1)$, since the
centred coefficient matrix $\tilde{C}$ has rank at most $\min(K, n-1)$.
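The projection approach of this subsection can be sketched in a few lines
of NumPy. This is an illustrative sketch under stated assumptions, not the
implementation of the fda package: the curves are assumed to be sampled on
a common time grid, all integrals are replaced by trapezoidal quadrature,
and the demo uses a hypothetical cosine basis and synthetic curves. The
function names `functional_pca` and `trapezoid_weights` are made up for
this example.

```python
import numpy as np


def trapezoid_weights(t):
    """Trapezoidal quadrature weights for a sorted time grid t."""
    w = np.zeros_like(t, dtype=float)
    dt = np.diff(t)
    w[:-1] += dt / 2
    w[1:] += dt / 2
    return w


def functional_pca(X, Phi, weights):
    """Basis-projected functional PCA.

    X       : (n, T) array, n curves sampled on a common time grid
    Phi     : (K, T) array, K basis functions on the same grid
    weights : (T,) quadrature weights approximating the integral over t
    Returns the eigenvalues (descending), a (K, K) matrix whose columns
    are the coefficient vectors b_i, and the Gram matrix W.
    """
    n = X.shape[0]
    W = (Phi * weights) @ Phi.T          # W_kl = <Phi_k, Phi_l>
    V = (X * weights) @ Phi.T            # V_jk = <x_j, Phi_k>
    C = np.linalg.solve(W, V.T).T        # c_j = W^{-1} v_j for each curve
    C_centered = C - C.mean(axis=0)      # coefficients of x_j - mean curve
    S = C_centered.T @ C_centered / (n - 1)
    # Symmetric square root of W from its eigendecomposition.
    w_val, w_vec = np.linalg.eigh(W)
    W_half = (w_vec * np.sqrt(w_val)) @ w_vec.T
    W_half_inv = (w_vec / np.sqrt(w_val)) @ w_vec.T
    lam, U = np.linalg.eigh(W_half @ S @ W_half)
    order = np.argsort(lam)[::-1]        # sort eigenvalues descending
    lam, U = lam[order], U[:, order]
    B = W_half_inv @ U                   # b = W^{-1/2} u
    return lam, B, W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 14.0, 351)      # 14 s sampled 25 times per second
    w = trapezoid_weights(t)
    # Hypothetical cosine basis with K = 6 functions.
    Phi = np.array([np.cos(k * np.pi * t / 14.0) for k in range(6)])
    # Synthetic curves: common linear trend plus two random modes.
    scores = rng.normal(size=(50, 2)) * np.array([3.0, 1.0])
    X = 0.5 * t + scores[:, :1] * Phi[1] + scores[:, 1:] * Phi[2]
    lam, B, W = functional_pca(X, Phi, w)
    print(lam[:3])                       # two dominant modes, third near 0
```

Solving the symmetric problem for $u$ and mapping back with
$b = W^{-1/2} u$ guarantees that the resulting components are orthonormal
with respect to the $L^2$ inner product, i.e. $b_i^T W b_j = \delta_{ij}$.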

4.3. Statistics of the eigenvalues

In this subsection, we discuss how to reduce the information from the
set of eigenvalues to a few significant characteristics. In particular we
will focus on key figures that measure the strength of fluctuations and
their concentration in a few active modes.

The eigenvalue $\lambda_i$ represents the strength of fluctuations in the
respective mode of characteristic shape $\xi_i(t)$. The relative strength
$p_i$ of the variation in the mode $\xi_i(t)$ and the cumulative relative
strength $L_j$ up to the $j$-th mode $\xi_j(t)$ are given by

\[ p_i = \frac{\lambda_i}{\sum_{l=1}^{n} \lambda_l}, \qquad
   L_j = \sum_{i=1}^{j} p_i. \tag{19} \]

Two quantities that can be derived from the eigenvalues of the PCA are
of special interest. First, the total variation strength is simply the sum
of all eigenvalues, $\Lambda = \sum_{i=1}^{n} \lambda_i$. Second, the Gini
index $G$ is a measure of concentration that is built from the Lorenz curve
quantities $L_j$. Geometrically, the Gini index measures the area between
the diagonal and the Lorenz curve, cf. e.g. the right panel of Figure 8. It
is normalized such that it takes the value one if only one mode is active
and the value zero when all modes are equally activated,
$\lambda_1 = \lambda_2 = \dots = \lambda_n$. Note that the $\lambda_j$ are
in descending order, in contrast to the usual definition of the Gini index,
where the order is ascending. As an alternative, one could also consider
the entropy of the distribution of the total activity over the single
modes. The results, however, remain largely unchanged.
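The eigenvalue statistics of this subsection can be sketched as follows.
The total variation, relative strengths and cumulative strengths follow
directly from their definitions. For the Gini index the sketch assumes the
normalization
$G = \frac{2}{n-1} \sum_{j=1}^{n} \big(L_j - \tfrac{j}{n}\big)$, which is
one choice that satisfies the two stated properties ($G = 1$ for a single
active mode, $G = 0$ for equal eigenvalues); it is an assumption of this
example and may differ from the exact formula used by the authors. The
function name is hypothetical.

```python
def eigenvalue_statistics(eigenvalues):
    """Total variation, relative strengths, Lorenz values and Gini index."""
    lam = sorted(eigenvalues, reverse=True)      # descending order
    n = len(lam)
    total = sum(lam)                             # total variation strength
    if n < 2 or total <= 0:
        raise ValueError("need at least two eigenvalues with positive sum")
    p = [x / total for x in lam]                 # relative strength p_i
    lorenz, running = [], 0.0
    for p_i in p:
        running += p_i
        lorenz.append(running)                   # cumulative strength L_j
    # Assumed normalization: 0 for equal eigenvalues, 1 for a single mode.
    gini = 2.0 / (n - 1) * sum(L - (j + 1) / n for j, L in enumerate(lorenz))
    return total, p, lorenz, gini
```

With eigenvalues `[5, 0, 0, 0]` all cumulative strengths equal one and the
index is 1; with `[1, 1, 1, 1]` the Lorenz values lie on the diagonal and
the index is 0.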

4.4. Deviation measures

In this section we derive some quantities that can be used to measure
the distance between one set of functional data and another such data set.
In particular we will utilize these distances for benchmarking models with
respect to their distance to the experiment. Two distance measures will be
employed in the following: First, the mean quadratic deviation between the
average trajectories of the model on the one hand and the experimental data
on the other. Secondly, we consider the Hilbert-Schmidt norm of the
difference between the respective empirical covariance functions as a
measure of the distance between the fluctuation behavior of the experiment
and the simulation. In the following we work with the data after projection
to a finite spline basis $\Phi_k(t)$.

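We start with the mean quadratic difference in the average behavior. The
mean of the observed functions $x_j(t)$ is:

\[ \bar{x}(t) = \frac{1}{n} \sum_{j=1}^{n} \sum_{k=1}^{K} c_{j,k}\,\Phi_k(t)
   = \sum_{k=1}^{K} \underbrace{\Big( \frac{1}{n} \sum_{j=1}^{n} c_{j,k}
     \Big)}_{=\,\bar{c}_k} \Phi_k(t)
   = \sum_{k=1}^{K} \bar{c}_k\,\Phi_k(t). \tag{20} \]

The mean quadratic distance, i.e. the squared $L^2$ norm of the difference
between $\bar{x}(t) = \sum_{k=1}^{K} \bar{c}_k \Phi_k(t)$ and
$\bar{y}(t) = \sum_{k=1}^{K} \bar{c}'_k \Phi_k(t)$, is:

\[ \|\bar{x} - \bar{y}\|_{L^2}^2
   = \Big\langle \sum_{k=1}^{K} (\bar{c}_k - \bar{c}'_k)\,\Phi_k,\;
     \sum_{k=1}^{K} (\bar{c}_k - \bar{c}'_k)\,\Phi_k \Big\rangle
   = (\bar{c} - \bar{c}')^T W (\bar{c} - \bar{c}'). \tag{21} \]

We now derive formulae for measuring the distance between experiment and
simulation in the covariance structure. Let

\[ D(s,t) = \hat{C}(s,t) - \hat{C}'(s,t)
   = \sum_{k,l=1}^{K} d_{k,l}\,\Phi_k(s)\,\Phi_l(t) \]

be the difference of the covariance functions, where by (16) $d_{k,l}$ is
the $(k,l)$ entry of
$\frac{1}{n-1} \tilde{C}^T \tilde{C} - \frac{1}{n'-1} \tilde{C}'^T \tilde{C}'$,
with $\tilde{C}$ and $\tilde{C}'$ the centred coefficient matrices of the
two data sets. The squared Hilbert-Schmidt norm of $D(s,t)$ is:

\[ \|D\|_{HS}^2 = \int_0^L \!\! \int_0^L D(s,t)^2 \, ds \, dt
   = \mathrm{Tr}(D W D W). \tag{22} \]

Here $\mathrm{Tr}(H)$ stands for the trace of the matrix $H$ and $D$ is the
symmetric matrix with entries $d_{k,l}$.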
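Both deviation measures reduce to small matrix expressions in the basis
coefficients, which the following NumPy sketch evaluates. The function
names are hypothetical. The inputs are assumed to be the mean coefficient
vectors, the coefficient covariance matrices of the two data sets and the
Gram matrix $W$ of the basis. For a symmetric difference matrix $D$, the
expression $\mathrm{Tr}(D^T W D W)$ used in the code equals
$\mathrm{Tr}(D W D W)$.

```python
import numpy as np


def l2_distance_sq(c_mean, c_mean_other, W):
    """Squared L2 distance (c - c')^T W (c - c') between two mean curves."""
    d = np.asarray(c_mean, dtype=float) - np.asarray(c_mean_other, dtype=float)
    return float(d @ W @ d)


def hilbert_schmidt_sq(S, S_other, W):
    """Squared Hilbert-Schmidt norm of the covariance difference.

    S and S_other are (K, K) covariance matrices of basis coefficients,
    W is the (K, K) Gram matrix of the basis functions.
    """
    D = np.asarray(S, dtype=float) - np.asarray(S_other, dtype=float)
    return float(np.trace(D.T @ W @ D @ W))
```

A simple check of these formulae is to evaluate the mean curves and
$D(s,t)$ on a time grid and to compare against direct numerical quadrature
with the same weights that define $W$; both routes then agree up to
rounding.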


5. PCA Results 

In this section we show that functional PCA is a useful tool for detailed
validation of models for pedestrian dynamics. Ideally, the variability in
the data can be described with the help of the PCA with a few principal
components. These main components can be interpreted as the schemes for
the deviation of individual trajectories from the mean flow. This allows a
comparison of simulated and experimental data beyond averaged flow
features. Therefore, we apply the PCA to the experimental data and to the
simulated data from the two models, SFM and GCFM, and compare the results.
Here we apply the PCA to the x and y coordinates over time separately, as
this approach is somewhat more accessible to interpretation. For the
alternative approach of jointly analysing x and y trajectories and a
discussion of the pros and cons of both approaches, see [?].

The analysis of the data is based on the R package fda developed by J.O.
Ramsay et al. [12] with some minor extensions by the authors.


5.1. Preparation of the Data 

The pedestrian trajectory data in the experiment is recorded
electronically with video tracking at a rate of 25 frames per second. A
total of 149 trajectories have been recorded. Likewise, the SFM and GCFM
trajectories have been generated at 25 time steps per second, using Euler
integration with a time step $\Delta t = 0.01$ s. For both models, a total
of 149 trajectories have been simulated. See the trajectories in Figure 4.

Figure 4: XY plots of pedestrian trajectories generated by the SFM model
(left), experiment (middle) and GCFM model (right); horizontal axes: x / cm.

For the analysis, the pedestrian motion has to be stationary, i.e. all
agents move under the same conditions.

Furthermore, the considered time interval has to be the same for all
agents. Therefore we consider only those pedestrians who need more than
12 seconds to reach the exit. In addition, each trajectory is tracked for
only two seconds after the passage of the bottleneck entrance. The analysis
of the experimental data and the models is thus based on trajectories in
individual time intervals that range from 12 s before the individual’s
passage through the bottleneck to 2 s afterwards. This makes a total time
of 14 s. In order to avoid negative time values, we start each pedestrian
trajectory at time $t = 0$ s such that passage through the door occurs at
exactly $t = 12$ s for each individual. From now on we work with this time
scale. Figure 5 visualizes the formatting steps. After reformatting, a
total of 118 pedestrian trajectories were available for the experiment,
121 for the SFM and also 121 for the GCFM.





Figure 5: Plots of x-coordinates of pedestrians (experimental data): raw
data (left), stationary data (middle) and with individual time in the
interval [0,14] s (right). One frame in the first two panels corresponds to
1/25 s.
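The time formatting described above can be sketched as a small helper that
cuts each trajectory to the window from 12 s before to 2 s after its
bottleneck passage and shifts it to the individual time scale [0, 14] s.
The function name, the `(frame_number, x, y)` input format and the
`passage_frame` argument are assumptions made for this example; the frame
rate of 25 per second is the one stated in the text.

```python
def align_trajectory(frames, passage_frame, fps=25, before_s=12, after_s=2):
    """Cut one trajectory to [-12 s, +2 s] around its bottleneck passage.

    frames        : list of (frame_number, x, y) tuples
    passage_frame : frame at which this pedestrian enters the bottleneck
    Returns a list of (t, x, y) with t in [0, 14] s and passage at t = 12 s,
    or None if the trajectory does not cover the full window.
    """
    start = passage_frame - before_s * fps
    end = passage_frame + after_s * fps
    by_frame = {f: (x, y) for f, x, y in frames}
    # Discard pedestrians whose record is shorter than the time window.
    if any(f not in by_frame for f in range(start, end + 1)):
        return None
    return [((f - start) / fps, *by_frame[f]) for f in range(start, end + 1)]
```

Trajectories for which the helper returns `None` correspond to pedestrians
that are excluded from the analysis because they do not cover the full
14 s window.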

The experimental data contain the swaying caused by the bipedal
locomotion of pedestrians in combination with the tracking of markers on
the head. The SFM as well as the GCFM, however, model the movement of the
centre of mass and neglect the bipedal locomotion, which produces
swaying-free trajectories. Therefore, we smooth the data before the
analysis in order to filter out the lateral swaying. It turns out that
regression with a B-spline basis containing 10 elements with nodes equally
distributed over the underlying time interval [0,14] effectively removes
swaying while properly reproducing the other features of the individual’s
trajectories, see Figure 6.


Figure 6: Smoothing of an experimental trajectory (dotted blue) with a
B-spline basis of dimension 10 (solid red).
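The smoothing step can be sketched as a least-squares regression on a
B-spline basis. The sketch below is not the fda implementation: it assumes
cubic B-splines with clamped boundary knots, so that 10 basis functions on
[0, 14] correspond to 6 equally spaced interior knots, and it evaluates the
basis with the Cox-de Boor recursion in NumPy. The function names are
hypothetical.

```python
import numpy as np


def bspline_basis(t, knots, degree=3):
    """Evaluate all B-spline basis functions at the points t (Cox-de Boor)."""
    t = np.asarray(t, dtype=float)
    n_basis = len(knots) - degree - 1
    # Degree 0: indicator functions of the half-open knot intervals.
    B = np.zeros((len(knots) - 1, t.size))
    for i in range(len(knots) - 1):
        B[i] = (knots[i] <= t) & (t < knots[i + 1])
    # Close the last non-empty interval on the right end point.
    last = max(i for i in range(len(knots) - 1) if knots[i] < knots[i + 1])
    B[last] = np.where(t == knots[-1], 1.0, B[last])
    for d in range(1, degree + 1):
        B_new = np.zeros((len(knots) - d - 1, t.size))
        for i in range(len(knots) - d - 1):
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            if left > 0:
                B_new[i] += (t - knots[i]) / left * B[i]
            if right > 0:
                B_new[i] += (knots[i + d + 1] - t) / right * B[i + 1]
        B = B_new
    return B[:n_basis]


def smooth_trajectory(t, x, n_basis=10, t_min=0.0, t_max=14.0, degree=3):
    """Least-squares regression of samples x(t) on a B-spline basis."""
    n_inner = n_basis - degree - 1
    inner = np.linspace(t_min, t_max, n_inner + 2)[1:-1]
    knots = np.concatenate(
        ([t_min] * (degree + 1), inner, [t_max] * (degree + 1))
    )
    B = bspline_basis(t, knots, degree)      # shape (n_basis, len(t))
    coef, *_ = np.linalg.lstsq(B.T, x, rcond=None)
    return coef, coef @ B                    # coefficients, smoothed curve
```

Because the knot spacing of 2 s is much longer than the period of the
stepping motion, the regression cannot follow the swaying and retains only
the slowly varying part of the trajectory.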

5.2. Average trajectories 

As PCA components describe variation around some mean value, it is
essential to analyse the average functions $\bar{x}(t)$ and $\bar{y}(t)$.

Figure 7 shows the mean functions $\bar{x}(t)$ and $\bar{y}(t)$ of the
x- and y-components for the experiment and the models.

When we examine the x-component, we clearly identify discrepancies
between the experiment and both models. The average pedestrian in the
experiment shows a nearly linear progress to reach the exit. The
acceleration after passing the bottleneck, i.e. the increase in the slope,
is modest. In contrast, both SFM and GCFM show a slower progress of the
average pedestrian through the crowd and a much more pronounced
acceleration after the passage of the bottleneck. While the latter
deviation from experiment is about the same for both models, the
underestimation of the slope relative to the experimentally observed value
is bigger for the GCFM. Thus both models overestimate the dwell time,
indicating a missing anticipation and cooperation of the modelled
pedestrians. However, the SFM shows this effect to a lesser extent.

Figure 7: Average curves for position vs time for the SFM (left),
experiment (middle) and GCFM (right): x-coordinate (top) and y-coordinate
(bottom).

The y-component is described by both models in a satisfactory manner,
as trend lines only move at a scale of a few centimeters from the center of
the bottleneck. At least for the simulated data this is due to the
left-right reflection symmetry of the agents in both models and the
(approximate) symmetry of initial positions with respect to the $y = 0$
axis, i.e. the center line through the bottleneck. In the experiment, a
certain asymmetric behavior is visible for the mean y-position of the
trajectories over time. On average, the pedestrians approach the bottleneck
coming slightly from the left as seen from the direction of progress.
Interestingly, this asymmetry cannot be traced back to the initial
conditions, as these are the same for the experimental and the simulated
trajectories.

This behavior is absent in both models, which have left-right symmetries
in their respective constituting equations.

5.3. PCA eigenvalues

Having analyzed the average behavior, we now turn to the question of how
well the models describe fluctuations in pedestrian data around the
averages. We start with the absolute strength of PCA variability, which is
represented by the PCA eigenvalues $\lambda_i$, $i = 1, \dots, 10$, as we
are using a 10-dimensional spline basis. At the same time, we also consider
the cumulative relative strength $L_j$ in order to measure the
concentration or dispersion of variability in the experimental or the
simulated data. We first consider PCA mode strength for x-position over
time as given in Figure 8. These modes describe typical patterns of
pedestrians lagging behind or being in front of the trajectory of the
average pedestrian. The SFM overestimates the total amount of statistical
deviation in x-position from the average x-position by approximately 12%
as compared with the experiment. Also, the concentration of variability in
the first mode is slightly lower than in the experiment. In the GCFM, the
total level of x-position variation is underestimated by 37% of the total
variation. The relative concentration in the first mode is higher than in
the experimental data by an amount comparable to the SFM, but in the
opposite direction. For the values of total variation and Gini indices see
Table 2.

Figure 8: Fluctuation strength of PCA-modes in x-direction. Left: bar plot
of absolute fluctuation strength (eigenvalues) of the 10 PCA-modes
(harmonics) for x-position over time for the SFM, experiment and GCFM.
Right: cumulative relative strength of PCA-modes over all 10 harmonics.

In Figure 9 the PCA-modes for the statistical y-fluctuation around the
average y-position (essentially $y = 0$) are displayed. The experiment and
both simulations all show that basically only one mode is active,
representing the axially symmetric shape of the jammed area in front of the
bottleneck. The size of this area is underestimated by both models. The
GCFM predicts a pronouncedly reduced area in the y-direction covered by
trajectories passing the bottleneck in the next 12 seconds, showing a total
y-variation of approximately 1/3 (33.6%) of the experimental data. The
corresponding figure for the SFM is 57% of the experimental y-variation.
Total variations and Gini indices can again be found in Table 2.

Figure 9: Fluctuation strength of PCA modes in y-direction. Left: Bar plot of
absolute fluctuation strength (eigenvalues) of the 10 PCA modes (harmonics)
for the y-position over time for the SFM, the experiment and the GCFM.
Right: Cumulative relative strength of PCA modes over all 10 harmonics.

5.4. PCA modes

Figure 10 shows the principal fluctuation components of the x-position for
the experiment and the models. From the aforementioned eigenvalue analysis
(Figure 8) we can conclude that fluctuations can be mainly described by the
first three principal components.

Let us now give an interpretation of the morphology of the PCA components.
The first principal component describes the variation between the initial
positions of the pedestrians 12 seconds before passing the bottleneck.
Two pedestrians reach the bottleneck at the same reference time t = 12.
Because some pedestrians start at t = 0 with a higher or lower x-distance
to the bottleneck than the others, a statistical variation in x-positions
occurs. This could be called the “slipping-through effect”, because faster
pedestrians find more favorable configurations of fellow pedestrians ahead,
which allow a faster passage through the crowd.

Tot.Var       SFM      Experiment   GCFM
x-position    10863    9550         6835
y-position    35469    61619        20749

Gini          SFM      Experiment   GCFM
x-position    0.83     0.85         0.87
y-position    0.89     0.89         0.89

Table 2: Total variation (top) and Gini indices (bottom) for the eigenvalues
of the PCA.

Figure 10: PCA components for the x-position over time for the SFM (left),
experiment (middle) and GCFM (right). The first three harmonics are displayed.

The second and third principal components describe an effect which we can
associate with long stop-and-go behavior in different lanes in a traffic jam:
one trajectory is temporarily faster than the other, but afterwards it is the
other way round. In the case of the experiment and the GCFM, the third
principal component also shows different velocity patterns after the
bottleneck.

The morphological comparison of the experimental data with the SFM
and GCFM shows that the points of intersection of the first two principal
components are nearly at the same times. Thus both models reproduce the
qualitative behavior of statistical fluctuations in the pedestrians'
x-positions over time quite well. The main difference lies in the different
activation strength of the “slipping-through” mode.

Figure 11: PCA components in the y-direction for the SFM (left), experiment
(middle) and GCFM (right). The first three harmonics are displayed.

Figure 11 shows the PCA of the y-components for the experiment and the
models. We observe that the variability of the data can be mainly described
by the first principal component, which represents the shape of the crowd in
front of the door. The first principal component of both models describes the
experimental data acceptably well. The higher modes are also of quite similar
shape, although they should be neglected since they hardly contribute to the
total variation.

5.5. Evaluation of deviation measures 

Lastly in this section, we compare the deviation measures of the respective
simulation models with the experiment. The results are summarised in Table 3.

$L^2$-norm    EXP-GCFM   EXP-SFM
x-position    169.8      166.0
y-position    15.34      13.20

HS-norm       EXP-GCFM   EXP-SFM
x-position    2291       1282
y-position    4449       2880

Table 3: Deviations between experimental data and model data using the
$L^2$-norm (top) and the Hilbert-Schmidt norm (bottom).

The average squared distance of the x- and y-coordinates of the trajectories
as a function of time is of the same order of magnitude for both models.
A slight advantage can however be attributed to the SFM. This effect is even
more pronounced in the Hilbert-Schmidt norm, which measures the distance to
the experiment in the fluctuation structure of measured and simulated data.
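The two deviation measures can be illustrated with a short discretised
sketch. It assumes that trajectories are sampled on a uniform time grid with
spacing dt, that the $L^2$ distance is taken between mean curves and that the
Hilbert-Schmidt distance is taken between empirical covariance functions. The
function names, the array layout and the toy data are hypothetical, and the
normalisation used in this paper may differ.

```python
import numpy as np

def l2_distance(mean_a, mean_b, dt):
    """Discretised L2 norm of the difference between two mean curves."""
    diff = np.asarray(mean_a, dtype=float) - np.asarray(mean_b, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2) * dt))

def hs_distance(curves_a, curves_b, dt):
    """Discretised Hilbert-Schmidt norm of the covariance difference.

    curves_a, curves_b: arrays of shape (n_curves, n_times), one row per
    trajectory. Covariances are estimated over the rows (assumption).
    """
    cov_a = np.cov(np.asarray(curves_a, dtype=float), rowvar=False)
    cov_b = np.cov(np.asarray(curves_b, dtype=float), rowvar=False)
    return float(np.sqrt(np.sum((cov_a - cov_b) ** 2) * dt * dt))

# Toy data: two trajectories on a two-point time grid
a = np.array([[0.0, 0.0], [2.0, 2.0]])
b = np.array([[1.0, 1.0], [1.0, 1.0]])
print(l2_distance(a.mean(axis=0), b.mean(axis=0), dt=0.5))  # identical means
print(hs_distance(a, b, dt=0.5))                            # different fluctuations
```

The toy data share the same mean curve but differ in their fluctuations,
which is exactly the situation where the Hilbert-Schmidt distance adds
information beyond the $L^2$ distance of the averages.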

6. Statistical inference based on the bootstrap 

In the previous section, we evaluated the total variation and the Gini
coefficient as well as deviation measures for the average behavior and the
fluctuation structure ($L^2$- and HS-norms, respectively) in order to compare
simulated and experimental data. This descriptive approach, however, leaves
open the question to which extent these findings depend on the intrinsic
stochastic nature of pedestrian trajectories and to which extent they are due
to structural differences between simulated agents in the models and real
pedestrians observed in the experiment. In the present section, we describe
and apply a simulation-based test procedure in order to clarify to what
extent the observed differences between models and experiment are
statistically significant.
6.1. Bootstrapping PCA scores


As the basis of our statistical testing procedure, we use the bootstrap over
the matrix of principal components from HE]. We now briefly describe the
bootstrap approach. Given, e.g., the x-value of the i-th trajectory over time,
$x_i(t)$, the score of this trajectory with respect to the principal component
$\xi_j(t)$ is

\[
s_{i,j} = \sum_{k=1}^{K} \sum_{l=1}^{K} (c_{i,k} - \bar{c}_k) \, W_{k,l} \, b_{j,l}
        = \left( A^T W b^T \right)_{i,j} .
\tag{23}
\]


By construction, the scores $s_{i,j}$ and $s_{i,j'}$ are (linearly)
uncorrelated for $j \neq j'$. Neglecting potential higher-order correlations,
we construct a virtual bootstrap sample from the scores of the experimental
data by drawing with replacement, for $i, j$ fixed, $s_{i,j}^{(boot)}$ from
the $N$ samples $s_{l,j}$, $l = 1, \ldots, N$, of the original scores with
respect to the j-th principal component $\xi_j(t)$. Doing this independently
for $i = 1, \ldots, N$ and $j = 1, \ldots, K$ (remember, in our case N = 110
is the number of experimental pedestrian trajectories and K = 10 the number of
principal components) we obtain the $N \times K$ bootstrap score matrix
$s^{(boot)}$. The corresponding bootstrapped trajectories then are

\[
x_i^{(boot)}(t) = \left[ s^{(boot)} \xi(t) \right]_i + \bar{x}(t),
\qquad i = 1, \ldots, N.
\tag{24}
\]

Figure 12 shows the x-coordinates of pedestrian trajectories for bootstrapped
and experimental data.
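The resampling step described above can be sketched as follows. This is a
minimal illustration under stated assumptions rather than the authors' code:
it assumes NumPy, a score matrix of shape (N, K), principal components
evaluated on a time grid as an array of shape (K, T), and a mean curve of
shape (T,). The function name bootstrap_trajectories and the toy inputs are
hypothetical.

```python
import numpy as np

def bootstrap_trajectories(scores, components, mean_curve, rng):
    """Resample PCA scores column by column and rebuild trajectories.

    scores:     (N, K) original scores s_{i,j}
    components: (K, T) principal components sampled on a time grid
    mean_curve: (T,)   mean trajectory on the same grid
    rng:        numpy.random.Generator
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    # For every entry (i, j), draw a row index independently with replacement;
    # entry (i, j) of the bootstrap matrix is taken from column j only.
    idx = rng.integers(0, n, size=(n, k))
    boot_scores = np.take_along_axis(scores, idx, axis=0)
    # Rebuild trajectories: scores times components plus the mean curve
    trajectories = boot_scores @ np.asarray(components, dtype=float)
    trajectories = trajectories + np.asarray(mean_curve, dtype=float)
    return boot_scores, trajectories

rng = np.random.default_rng(0)
toy_scores = np.arange(12.0).reshape(4, 3)   # hypothetical scores, N = 4, K = 3
boot, traj = bootstrap_trajectories(toy_scores, np.eye(3), np.zeros(3), rng)
print(boot.shape, traj.shape)
```

Drawing each column independently uses the fact that scores of different
principal components are uncorrelated, while any higher-order dependence
between them is deliberately ignored.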

With this virtual data set, the PCA is then repeated. In particular, we
obtain bootstrapped quantities for total variation and Gini index, as well as
distance measures for the average behavior of the actual experiment and its
virtual bootstrap replica. This entire process is then repeated a sufficiently
high number of times, such that p-values in the range of the usual
significance levels of about 1% to 5% can safely be determined. Here we
generate $10^4$ bootstrap samples, each containing N = 118 virtual
trajectories, and thereby obtain a simulated distribution for each of the
aforementioned quantities.

Figure 12: Plots of x-coordinates of pedestrians from bootstrapped data (left)
and experimental data (right).

The same iteration is repeated for the y-coordinate and the x-coordinate
of the velocity, $v_x$.

For statistical testing, we generate two-sided confidence intervals for the
total variation and the Gini coefficient and one-sided (left-open) confidence
intervals for the distance measures, based on the empirical distributions of
the respective quantities. If the related quantities for the SFM and the GCFM
are not contained in these confidence regions, we consider this a positive
test result for a deviation between experiment and model.


6.2. Testing Gini indices and total variations 

One of the advantages of using this bootstrap technique is the opportunity
to examine the distributions of Gini indices and total variations. For every
bootstrap sample we compute the Gini indices and total variations of the
experimental data for x-coordinates, y-coordinates and velocities. Afterwards,
we are able to compute their empirical cumulative distribution functions
(ECDF). Figure 13 shows the ECDF of Gini indices and total variations of
x-coordinates for bootstrapped experimental data. The grey lines show the
values of the Gini index and total variation for the original experimental
data. The corresponding p-values, i.e. the critical level of statistical
significance where the difference between model and experiment becomes
significant, are calculated on the basis of two-sided confidence regions of
the bootstrapped distribution. The values are summarized in Table 4. Note
that p-values below $10^{-3}$ become numerically unreliable for $10^4$
bootstrap samples and are set to zero.
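A two-sided p-value of this kind can be read off the empirical distribution
of a bootstrapped quantity. The sketch below assumes NumPy and an equal-tailed
two-sided region, which is one common convention and not necessarily the exact
construction used here. The function name two_sided_p_value and the toy values
are hypothetical.

```python
import numpy as np

def two_sided_p_value(boot_values, model_value):
    """Equal-tailed two-sided p-value from an empirical bootstrap distribution.

    boot_values: bootstrapped values of a statistic (e.g. Gini index)
    model_value: value of the same statistic for a simulation model
    """
    boot = np.asarray(boot_values, dtype=float)
    lower = np.mean(boot <= model_value)   # empirical mass at or below the value
    upper = np.mean(boot >= model_value)   # empirical mass at or above the value
    return float(min(1.0, 2.0 * min(lower, upper)))

# Toy bootstrap distribution: the values 1, 2, ..., 100
toy = np.arange(1, 101)
print(two_sided_p_value(toy, 50.5))    # central value
print(two_sided_p_value(toy, 3))       # value in the lower tail
print(two_sided_p_value(toy, 1000))    # value outside the bootstrap range
```

A model value outside the range of all bootstrap values yields a p-value of
zero in this sketch, which corresponds to the entries reported as 0.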

The p-values in Table 4 show that statistical testing reveals significant
differences between experiment and model with respect to several Gini indices
and total variations. At the 5% significance level, only the y-position Gini
index of the GCFM (p = 0.610), the x-position Gini index of the SFM
(p = 0.055, marginal) and the x-position total variation of the SFM
(p = 0.297) do not produce significant differences.


6.3. Tests based on deviation measures 

Figure 13: ECDF of Gini indices (left) and ECDF of total variation (right)
from bootstrapped experimental data. Green, grey and blue vertical lines mark
the values for the SFM, experiment and GCFM, respectively.

We are now interested in measuring the deviations between bootstrapped
and experimental data. First, we compute for every bootstrap sample the
$L^2$-norm of the difference between the mean trajectory of the bootstrap
sample based on the virtual experimental data and the mean trajectory of
the experiment. In this way we obtain an empirical distribution of the
$L^2$-norm distance due to natural fluctuation inside the experiment. This is
then compared with the $L^2$-norm distance of average trajectories between
the experiment and the SFM and GCFM, respectively. The same procedure is
also carried out for the Hilbert-Schmidt distance between the estimated
correlation functions, see Figure 14.


Again, p-values are calculated as in the previous subsection; this time,
however, we have to use one-sided confidence regions for statistical testing.
The p-values are displayed in Table 5.
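For a distance measure, only large values speak against the model, so the
one-sided p-value is simply the fraction of bootstrap distances that are at
least as large as the distance between experiment and model. The following
sketch assumes NumPy; the function name one_sided_p_value and the toy
distances are hypothetical.

```python
import numpy as np

def one_sided_p_value(boot_distances, model_distance):
    """Fraction of bootstrap distances at least as large as the model distance."""
    boot = np.asarray(boot_distances, dtype=float)
    return float(np.mean(boot >= model_distance))

# Toy distances between bootstrap replicas and the experiment
toy_distances = [0.1, 0.2, 0.3, 0.4]
print(one_sided_p_value(toy_distances, 0.35))  # one of four replicas is farther away
print(one_sided_p_value(toy_distances, 1.0))   # no replica is as far away
```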

p-values for Gini index    EXP/GCFM   EXP/SFM
x-position                 0          0.055
y-position                 0.610      0.007

p-values for TotalVar      EXP/GCFM   EXP/SFM
x-position                 0.034      0.297
y-position                 0          0

Table 4: The p-values of Gini indices and total variations from bootstrapped
experimental data.

All statistical tests of the $L^2$-norm distance between the average x-values
and $v_x$-values of the experiment and the models are highly significant. We
thus find a clear indication that model and experiment are statistically
distinguishable. Not unexpectedly, the situation is different for the average
of the y-values over time. Here, due to the axial symmetry of the experimental
setup, no major differences of the average y-trajectory can be observed.
This also shows that the slight asymmetry in the average y-trajectory in the
experimental data is not statistically significant.

With regard to the fluctuation structure, the x-position fluctuations
encoded by the empirical correlation function cannot be easily distinguished
between the SFM and the experiment. The difference between the fluctuation of
the x-trajectories of the GCFM and the experiment also shows a marginal
p-value of about 12%, which is not significant when compared to the usual 5%
level of significance. All other fluctuation structures differ significantly
between the experiment and the models.
p-values for $L^2$ norm    EXP/GCFM   EXP/SFM
x-position                 0          0
y-position                 0.512      0.576

p-values for HS norm       EXP/GCFM   EXP/SFM
x-position                 0.122      0.434
y-position                 0          0

Table 5: The p-values for the $L^2$ norm and the Hilbert-Schmidt norm.

7. Summary 

The functional PCA has been applied as a diagnostic tool to assess model
quality for agent-based simulations of pedestrian flows with respect to
average behavior and beyond. Here we applied it to experimentally measured
pedestrian trajectories passing through a bottleneck and agent trajectories
simulated by two different force-based models. Both models are a priori
calibrated to satisfy qualitative and quantitative criteria.

Already in the analysis of mean flow behavior, the PCA reveals considerable
and statistically significant deviations of both models from the experiment.
In the x-direction, the SFM and the GCFM predict slower progress and lower
velocities, leading to a longer dwell time before the bottleneck than
experimentally observed, although the total throughflow is reproduced
correctly. This effect is more pronounced for the GCFM, so that the SFM
reproduces the average behavior of the experiment, for the given set of
parameters, relatively better.
Figure 14: The ECDF of the $L^2$-norm difference of average x(t) values (left)
and of the Hilbert-Schmidt (HS) norm difference between covariance functions
of x(t) fluctuations (right) of experimental and bootstrapped data. Green and
blue vertical lines show the $L^2$- and HS-norm distances between the
experiment and the SFM and GCFM, respectively.

Coming to the statistical variations simulated in the SFM for the x-position,
we find a quite reasonable match in the qualitative behavior (the PCA mode
shapes) of the x- and y-position over time. A certain deviation in the
quantitative variation strength is observed as well. Again the SFM predictions
are a little closer when measured in terms of the deviation in the total
variation. Also, the concentration of variability in the dominating modes is
underestimated by the SFM and overestimated by the GCFM for the x-variations,
while in the velocity variations both models show a higher degree of
concentration than the experiment.

In summary, the PCA gives none of the models a clear “pass”, as the
SFM and the GCFM both differ significantly from the experiment, although
both models were validated quantitatively with respect to the experimental
flow through the bottleneck. The SFM however performs relatively better
than the GCFM, which is mostly due to a gradually better prediction of the
average x-positions and x-velocities.

Caution is needed when applying these (and presumably also other) models
to evacuation studies, as they evidently do not capture all features of
real-life pedestrian flows, as has been shown by our functional data analysis.
The overall picture of all the qualitative metrics derived from the PCA
however slightly favours the SFM as the more accurate model over the GCFM,
given our set of model parameters.

In this case study, applying functional PCA for the first time to pedestrian 
flows, we have thus shown that it is in fact a useful tool to benchmark and 
statistically test agent based pedestrian flow models. Given the amount of 
deviation between experiment and model, it is certainly of interest to use 
this methodology for the future refinement of pedestrian flow models and for 
the critical assessment of models used by practitioners. 